NSF PAR Search | NSF Public Access Repository

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Zheng, Z.; Peng, P.; Ma, Z.; Chen, X.; Choi, E.; Harwath, D. (May 2025, https://doi.org/10.48550/arXiv.2402.01591)

Spatial sound reasoning is a fundamental human skill, enabling us to navigate and interpret our surroundings based on sound. In this paper, we present BAT, which combines the spatial sound perception ability of a binaural acoustic scene analysis model with the natural language reasoning capabilities of a large language model (LLM) to replicate this innate ability. To address the lack of existing datasets of in-the-wild spatial sounds, we synthesized a binaural audio dataset using AudioSet and SoundSpaces 2.0. Next, we developed SpatialSoundQA, a spatial sound-based question-answering dataset offering a range of QA tasks that train BAT in various aspects of spatial sound perception and reasoning. The acoustic front-end encoder of BAT is a novel spatial audio encoder named Spatial Audio Spectrogram Transformer, or Spatial-AST, which by itself achieves strong performance across sound event detection, spatial localization, and distance estimation. By integrating Spatial-AST with the LLaMA-2 7B model, BAT goes beyond standard Sound Event Localization and Detection (SELD) tasks, enabling the model to reason about the relationships between the sounds in its environment. Our experiments demonstrate BAT's superior performance on both spatial sound perception and reasoning, showcasing the potential of LLMs in navigating and interpreting complex spatial audio environments.
Full Text Available
Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations

Sun, X.; Zheng, Z. (May 2024, International Conference on Learning Representations (ICLR))

Full Text Available
Federated Q-Learning: Linear Regret Speedup with Low Communication Cost

Zheng, Z.; Gao, F.; Xue, L.; Yang, J. (May 2024, The Twelfth International Conference on Learning Representations)

Full Text Available
Robust Q-Learning against State Perturbations: A Belief-Enriched Pessimistic Approach

Sun, X.; Zheng, Z. (December 2023, NeurIPS Workshop on Multi-Agent Security: Security as Key to AI Safety (MASEC))

Pandering in a (Flexible) Representative Democracy

Sun, X.; Masur, J.; Abramowitz, B.; Mattei, N.; Zheng, Z. (July 2023, Conference on Uncertainty in Artificial Intelligence (UAI))

Full Text Available
Online Learning for Adaptive Probing and Scheduling in Dense WLANs

Xu, T.; Zhang, D.; Zheng, Z. (April 2023, IEEE International Conference on Computer Communications (INFOCOM))

Full Text Available
Learning to Backdoor Federated Learning

Li, H.; Wu, C.; Zhu, S.; Zheng, Z. (May 2023, ICLR 2023 Workshop on Backdoor Attacks and Defenses in Machine Learning (BANDS))

Full Text Available
A First Order Meta Stackelberg Method for Robust Federated Learning

Pan, Y.; Li, T.; Li, H.; Xu, T.; Zhu, Q.; Zheng, Z. (July 2023, ICML Workshop on New Frontiers in Adversarial Machine Learning (AdvML-Frontiers'23))

Full Text Available
Does Delegating Votes Protect Against Pandering Candidates? (Extended Abstract)

Sun, X.; Masur, J.; Abramowitz, B.; Mattei, N.; Zheng, Z. (May 2023, International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS))

Full Text Available
Towards Optimal Tradeoff Between Data Freshness and Update Cost in Information-update Systems

https://doi.org/10.1109/ICCCN54977.2022.9868923

Liu, Z.; Li, B.; Zheng, Z.; Hou, Y. T.; Ji, B. (April 2023, IEEE Internet of Things Journal)

Full Text Available